Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis
Abstract
We describe experiments in building HMM text-to-speech voices on professional broadcast news data from multiple speakers. We build on earlier work comparing techniques for utterance selection and voice adaptation, with the aim of producing the most natural-sounding voices. Our ultimate goal is to develop intelligible and natural-sounding synthetic voices in low-resource languages rapidly and without the expense of collecting and annotating data specifically for text-to-speech; we focus on English initially in order to develop and evaluate our methods. We evaluate our approaches using crowdsourced listening tests for naturalness. We find that two strategies produce the most natural-sounding voices: removing utterances that are outliers with respect to hyperarticulation, and combining the selection of hypoarticulated utterances with the selection of low mean f0 utterances.
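The two selection strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how articulation is measured, what counts as an outlier, or how the two selections are combined, so the `articulation` score, the z-score cutoff `z_max`, the `fraction` parameter, and the use of a set intersection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Utterance:
    utt_id: str
    mean_f0: float       # mean fundamental frequency in Hz
    articulation: float  # hypothetical score; higher = more hyperarticulated


def remove_hyperarticulated_outliers(utts, z_max=2.0):
    """Drop utterances whose articulation z-score exceeds z_max (assumed cutoff)."""
    scores = [u.articulation for u in utts]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return list(utts)  # all scores identical, so nothing is an outlier
    return [u for u in utts if (u.articulation - mu) / sigma <= z_max]


def lowest_fraction(utts, key, fraction):
    """Return the given fraction of utterances with the lowest key value."""
    n = max(1, int(len(utts) * fraction))
    return sorted(utts, key=key)[:n]


def select_hypo_and_low_f0(utts, fraction=0.5):
    """Keep utterances that are both hypoarticulated and low in mean f0.

    Combining the two criteria by intersection is an assumption.
    """
    hypo = {u.utt_id for u in lowest_fraction(utts, lambda u: u.articulation, fraction)}
    low_f0 = {u.utt_id for u in lowest_fraction(utts, lambda u: u.mean_f0, fraction)}
    keep = hypo & low_f0
    return [u for u in utts if u.utt_id in keep]
```

In practice the per-utterance features would come from an acoustic analysis step, and the thresholds would be tuned against listening-test results rather than fixed in advance.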
Similar resources
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
Data Selection for Naturalness in HMM-based Speech Synthesis
We describe experiments in training HMM text-to-speech voices on professional broadcast news data from multiple speakers. We compare data selection techniques designed to identify the best utterances for voice training in a corpus not explicitly recorded for synthesis, aiming to select utterances from the corpus which will produce the most natural-sounding voices. We also explore different meth...
A hybrid TTS between unit selection and HMM-based TTS under limited data conditions
The intelligibility of HMM-based TTS can reach that of the original speech. However, HMM-based TTS is far from natural. On the contrary, unit selection TTS is the most-natural sounding TTS currently. However, its intelligibility and naturalness on segmental duration and timing are not stable. Additionally, unit selection needs to store a huge amount of data for concatenation. Recently, hybrid a...
HMM-based polyglot speech synthesis by speaker and language adaptive training
This paper describes a technique for speaker and language adaptive training (SLAT) for HMM-based polyglot speech synthesis and its evaluations on a multi-lingual speech corpus. The SLAT technique allows multi-speaker/multi-language adaptive training and synthesis to be performed. Experimental results show that the SLAT technique achieves better naturalness than both speaker-adaptively trained l...
Improvement of prosodic characteristic in Vietnamese speech synthesis system base on HMM
The key factors helping people to understand the synthesized voices of text-to-speech system are the naturalness and the intelligibility. However, making more natural voices remains a difficult task because of the speech data’s scarcity. With data limited corpus, prosodic information such as tone, intonation, Part-of-Speech is added to ensure the quality of synthetic speech. In the paper, we in...
Publication date: 2016